
Devise and test a heuristic to infer the IPA language code for a given lang #489

Merged: joanise merged 3 commits into main from dev.ej/test-ipa-heuristic on Apr 27, 2026


@joanise joanise commented Apr 24, 2026

PR Goal?

This work is triggered by EveryVoice currently not supporting sal-apa as an input language, because we get an InvalidLanguageCode exception from sal-apa-ipa. The correct IPA code is actually sal-ipa.

We currently have no rule that deterministically derives the IPA language code from an input language code, but our convention happens to be either lang_id+"-ipa" or, if that is not found, lang_id[:3]+"-ipa". The test case added in this PR formalizes that convention by asserting it in unit tests.
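A minimal sketch of the convention described above, for illustration only. The function name infer_ipa_lang_code, the known_langs parameter, and the toy set of codes are hypothetical and are not part of g2p; the actual get_ipa_lang_code in test_langs.py is not shown in this conversation. Raising ValueError when both candidates are missing is also an assumption, since the description does not say what should happen in that case.

```python
from collections.abc import Collection


def infer_ipa_lang_code(lang_id: str, known_langs: Collection[str]) -> str:
    """Guess the IPA language code for lang_id (hypothetical helper).

    Try lang_id + "-ipa" first, then fall back to lang_id[:3] + "-ipa".
    known_langs stands in for whatever set of valid codes the caller has.
    """
    candidate = lang_id + "-ipa"
    if candidate in known_langs:
        return candidate
    # Fallback: keep only the first three characters of the input code.
    candidate = lang_id[:3] + "-ipa"
    if candidate in known_langs:
        return candidate
    # Assumption: the PR text does not specify the behaviour for this case.
    raise ValueError(f"no IPA language code found for {lang_id!r}")


# Toy set of codes for illustration only, not read from g2p.
KNOWN_LANGS = {"sal-apa", "sal-ipa", "fra", "fra-ipa"}

print(infer_ipa_lang_code("sal-apa", KNOWN_LANGS))
print(infer_ipa_lang_code("fra", KNOWN_LANGS))
```

With this toy set, sal-apa falls through to the three-character fallback because sal-apa-ipa is not a known code, which mirrors the EveryVoice failure described above.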

Fixes?

While not actually fixing EveryVoiceTTS/EveryVoice#789, this PR makes sure the solution I'm going to propose for that bug will keep working in the future.

Feedback sought?

Careful analysis of get_ipa_lang_code in test_langs.py: do you have a better solution?

I would have preferred to add a proper function to g2p, but I want my solution to work on past and future versions of g2p, so that's not really possible. Instead, I'm formalizing via unit tests what I'm going to assume is true of the g2p library in EveryVoice.

Priority?

normal

Tests added?

yup

How to test?

pytest g2p/tests/test_langs.py  -v

Confidence?

medium

Version change?

no

@joanise joanise requested a review from roedoejet April 24, 2026 18:25
Comment thread g2p/tests/test_langs.py

assert (
error_count == 0
), f'g2p mapping errors found, look for "{error_prefix}" above for detail.'

test_io() above is not changed except for outdenting it from a TestSuite class method to a pytest style test function.
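To illustrate what that outdenting looks like in general, here is a small sketch contrasting the two styles. The helper count_errors and the class name LangTestSketch are hypothetical stand-ins, not code from this PR; the real test_io() body is only partly visible in the thread above.

```python
import unittest


def count_errors(pairs):
    """Hypothetical stand-in check: count (actual, expected) pairs that differ."""
    return sum(1 for actual, expected in pairs if actual != expected)


# Before: unittest style, where the test is a method on a TestCase subclass.
class LangTestSketch(unittest.TestCase):
    def test_io(self):
        self.assertEqual(count_errors([("a", "a")]), 0)


# After: pytest style, the same check outdented to a module-level function
# that uses a plain assert with a message.
def test_io():
    error_count = count_errors([("a", "a")])
    assert error_count == 0, "g2p mapping errors found"
```

pytest collects both forms, so a conversion like this can be done one test at a time.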

Determining the correct -ipa language code given an input language code
is unfortunately not as straightforward as we'd like. This commit adds a
proposed heuristic function and tests it, to make sure this heuristic
remains 100% correct in the future.

Also a number of apparently unrelated changes to please mypy.
@joanise joanise force-pushed the dev.ej/test-ipa-heuristic branch from c188a89 to 9d2a7c1 Compare April 24, 2026 20:22

github-actions Bot commented Apr 24, 2026

CLI load time: 0:00.07
Pull Request HEAD: d21d9b06014af6b8b6285ba7fe84d2813fe11f61
Imports that take more than 0.1 s:
import time: self [us] | cumulative | imported package


codecov Bot commented Apr 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


joanise added a commit to EveryVoiceTTS/EveryVoice that referenced this pull request Apr 24, 2026
using the technique documented and tested in
NRC-ILT/g2p#489

Fixes #789
joanise added a commit to EveryVoiceTTS/EveryVoice that referenced this pull request Apr 27, 2026
using the technique documented and tested in
NRC-ILT/g2p#489

Fixes #789

joanise commented Apr 27, 2026

@roedoejet PR updated. Note that critical review of this PR is important, because if we don't like get_ipa_lang_code() some day, we won't get to change our mind about it given the promises I'm making here.


@roedoejet roedoejet left a comment


Looks good - thanks @joanise

@joanise joanise merged commit f1b4249 into main Apr 27, 2026
7 checks passed
@joanise joanise deleted the dev.ej/test-ipa-heuristic branch April 27, 2026 19:42
joanise added a commit to EveryVoiceTTS/EveryVoice that referenced this pull request Apr 27, 2026